38 research outputs found

    A -1.8V to 0.9V body bias, 60 GOPS/W 4-core cluster in low-power 28nm UTBB FD-SOI technology

    Get PDF
    A 4-core cluster fabricated in low power 28nm UTBB FD-SOI conventional well technology is presented. The SoC architecture enables the processors to operate 'on-demand' on a 0.44V (1.8MHz) to 1.2V (475MHz) supply voltage wide range and -1.2V to 0.9V body bias wide range achieving the peak energy efficiency of 60 GOPS/W, (419\u3bcW, 6.4MHz) at 0.5V with 0.5V forward body bias. The proposed SoC energy efficiency is 1.4x to 3.7x greater than other low-power processors with comparable performance

    Approximate 32-Bit Floating-Point Unit Design with 53% Power-Area Product Reduction

    Get PDF
    The floating-point unit is one of the most common building block in any computing system and is used for a huge number of applications. By combining two state-of-the-art techniques of imprecise hardware, namely Gate-Level Pruning and Inexact Speculative Adder, and by introducing a novel Inexact Speculative Multiplier architecture, three different approximate FPUs and one reference IEEE-754 compliant FPU have been integrated in a 65 nm CMOS process within a low-power multi-core processor. Silicon measurements show up to 27% power, 36% area and 53%power-area product savings compared to the IEEE-754 single-precision FPU. Accuracy loss has been evaluated with a high-dynamic-range image tone-mapping algorithm, resulting in small but non-visible errors with image PSNR value of 90 dB

    Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture

    Get PDF
    In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grostl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account

    Real-time high-sensitivity impedance measurement interface for tethered BLM biosensor arrays

    Get PDF
    This paper presents a switched-capacitor (SC) current integrator circuit for impedance measurement of tethered bilayer lipid membrane (tBLM) biosensors. The circuit comprises a small number of high performance components enabling enhanced experimental flexibility and reliability. The sensitivity is improved significantly by suppressing the output offset through pseudo-differential operation, using R-C components for the reference impedance. The sensing and reference electrodes are excited with low-amplitude differential voltage pulses and the current response to membrane resistance (RM) change of the tBLM biosensor is converted to voltage by a precision, low-noise SC integrator available as a single-package IC. Tests with both electrical models and actual biosensors demonstrated that the proposed circuit operates with high sensitivity and can be used in single chip versions for low-cost and high-sensitive tBLM biosensor arrays, featuring multiple electrode sites

    Breaking ECC2K-130

    Get PDF
    Elliptic-curve cryptography is becoming the standard public-key primitive not only for mobile devices but also for high-security applications. Advantages are the higher cryptographic strength per bit in comparison with RSA and the higher speed in implementations. To improve understanding of the exact strength of the elliptic-curve discrete-logarithm problem, Certicom has published a series of challenges. This paper describes breaking the ECC2K-130 challenge using a parallelized version of Pollard\u27s rho method. This is a major computation bringing together the contributions of several clusters of conventional computers, PlayStation~3 clusters, computers with powerful graphics cards and FPGAs. We also give /preseestimates for an ASIC design. In particular we present * our choice and analysis of the iteration function for the rho method; * our choice of finite field arithmetic and representation; * detailed descriptions of the implementations on a multitude of platforms: CPUs, Cells, GPUs, FPGAs, and ASICs; * details about running the attack

    Novel Front-End Circuit Architectures for Integrated Bio-Electronic Interfaces

    No full text
    The prospective use of upcoming nanometer CMOS technology nodes (65nm, 45nm, and beyond) in bio-electronic interfaces is raising a number of important issues concerning circuit architectures and design. In particular, the advantages of scaling and higher density integration must be balanced against the requirements of low noise design, uniform power density and surface temperature distribution, better component matching, and immunity to parameter variations. Dealing with these constraints also requires more innovative approaches towards hybrid integration technologies. In this paper, we discuss the key design issues with specific examples from DNA detection, protein detection, and neuro-electronic interfaces

    4.6 A 65nm CMOS 6.4-to-29.2pJ/[email protected] shared logarithmic floating point unit for acceleration of nonlinear function kernels in a tightly coupled processor cluster

    No full text
    Energy-efficient computing and ultra-low-power operation are requirements for many application areas, such as IoT and wearables. While for some applications, integer and fixed-point processor instructions suffice, others (e.g. simultaneous localization and mapping - SLAM, stereo vision, nonlinear regression and classification) require a larger dynamic range, typically obtained using single/double-precision floating point (FP) instructions. Logarithmic number systems (LNS) have been proposed [1,2] as an energy-efficient alternative to conventional FP, as several complex operations such as MUL, DIV, and EXP translate into simpler arithmetic operations in the logarithmic space and can be efficiently calculated using integer arithmetic units. However, ADD and SUB become nonlinear and have to be approximated by look-up tables (LUTs) and interpolation, which is typically implemented in a dedicated LNS unit (LNU) [1,2]. The area of LNUs grows exponentially with the desired precision, and an LNU with accuracy comparable to IEEE single-precision format is larger than a traditional floating-point unit (FPU). However, we show that in multi-core systems optimized for ultra-low-power operation such as the PULP system [3], one LNU can be efficiently shared in a cluster as indicated in Fig. 4.6.1. This arrangement not only reduces the per-core area overhead, but more importantly, allows several costly operations such as FP MUL/DIV to be processed without contention within the integer cores without additional overhead. We show that for typical nonlinear processing tasks, our LNU design can be up to 4.2 7 more energy efficient than a private-FP design

    Efficient and side-channel-aware implementations of elliptic curve cryptosystems over prime fields

    No full text
    Elliptic curve cryptosystems (ECCs) are utilised as an alternative to traditional public-key cryptosystems, and are more suitable for resource-limited environments because of smaller parameter size. In this study, the authors carry out a thorough investigation of side-channel attack aware ECC implementations over finite fields of prime characteristic including the recently introduced Edwards formulation of elliptic curves. The Edwards formulation of elliptic curves is promising in performance with built-in resiliency against simple side-channel attacks. To our knowledge the authors present the first hardware implementation for the Edwards formulation of elliptic curves. The authors also propose a technique to apply non-adjacent form (NAF) scalar multiplication algorithm with side-channel security using the Edwards formulation. In addition, the authors implement Joye's highly regular add-always scalar multiplication algorithm both with the Weierstrass and Edwards formulation of elliptic curves. Our results show that the Edwards formulation allows increased area-time performance with projective coordinates. However, the Weierstrass formulation with affine coordinates results in the simplest architecture, and therefore has the best area-time performance as long as an efficient modular divider is available
    corecore